Search CORE

10 research outputs found

Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

Author: Brazdil Tomas
Chatterjee Krishnendu
Novotny Petr
Vahala Jiri
Publication venue
Publication date: 27/02/2020
Field of study

Markov decision processes (MDPs) are the defacto frame-work for sequential decision making in the presence ofstochastic uncertainty. A classical optimization criterion forMDPs is to maximize the expected discounted-sum pay-off, which ignores low probability catastrophic events withhighly negative impact on the system. On the other hand,risk-averse policies require the probability of undesirableevents to be below a given threshold, but they do not accountfor optimization of the expected payoff. We consider MDPswith discounted-sum payoff with failure states which repre-sent catastrophic outcomes. The objective of risk-constrainedplanning is to maximize the expected discounted-sum payoffamong risk-averse policies that ensure the probability to en-counter a failure state is below a desired threshold. Our maincontribution is an efficient risk-constrained planning algo-rithm that combines UCT-like search with a predictor learnedthrough interaction with the MDP (in the style of AlphaZero)and with a risk-constrained action selection via linear pro-gramming. We demonstrate the effectiveness of our approachwith experiments on classical MDPs from the literature, in-cluding benchmarks with an order of 10^6 states.Comment: Published on AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Verification of Open Interactive Markov Chains

Author: Brazdil Tomas
Hermanns Holger
Krcal Jan
Kretinsky Jan
Rehak Vojtech
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2012)
Publication date: 01/01/2012
Field of study

Interactive Markov chains (IMC) are compositional behavioral models extending both labeled transition systems and continuous-time Markov chains. IMC pair modeling convenience - owed to compositionality properties - with effective verification algorithms and tools - owed to Markov properties. Thus far however, IMC verification did not consider compositionality properties, but considered closed systems. This paper discusses the evaluation of IMC in an open and thus compositional interpretation. For this we embed the IMC into a game that is played with the environment. We devise algorithms that enable us to derive bounds on reachability probabilities that are assured to hold in any composition context

Dagstuhl Research Online Publication Server

Verification of Probabilistic Recursive Sequential Programs

Author: Tomas Brazdil
Publication venue
Publication date
Field of study

CiteSeerX

Analysis and Prediction of the Long-Run Behavior of Probabilistic Sequential Programs with Recursion (Extended Abstract)

Author: Tomas Brazdil et al.
Publication venue
Publication date
Field of study

We introduce a family of long-run average properties of Markov chains that are useful for purposes of performance and reliability analysis, and show that these properties can effectively be checked for a subclass of infinite-state Markov chains generated by probabilistic programs with recursive procedures. We also show how to predict these properties by analyzing finite prefixes of runs, and present an efficient prediction algorithm for the mentioned subclass of Markov chains

CiteSeerX

On the Memory Consumption of Probabilistic Pushdown Automata

Author: Javier Esparza
Stefan Kiefer
Tomas Brazdil
Publication venue
Publication date: 01/01/2009
Field of study

We investigate the problem of evaluating memory consumption for systems modelled by probabilistic pushdown automata (pPDA). The space needed by a runof a pPDA is the maximal height reached by the stack during the run. Theproblem is motivated by the investigation of depth-first computations that playan important role for space-efficient schedulings of multithreaded programs. We study the computation of both the distribution of the memory consumption and its expectation. For the distribution, we show that a naive method incurs anexponential blow-up, and that it can be avoided using linear equation systems.We also suggest a possibly even faster approximation method.Given~

varepsilon>0

, these methods allow to compute bounds on the memoryconsumption that are exceeded with a probability of at most~

varepsilon

. For the expected memory consumption, we show that whether it is infinite can be decided in polynomial time for stateless pPDA (pBPA) and in polynomial space for pPDA. We also provide an iterative method for approximating theexpectation. We show how to compute error bounds of our approximation methodand analyze its convergence speed. We prove that our method convergeslinearly, i.e., the number of accurate bits of the approximation is a linear function of the number of iterations

CiteSeerX

Dagstuhl Research Online Publication Server